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Abstract: In Markov chain Monte Carlo (MCMC) sampling considerable 
thought goes into constructing random transitions. But those transitions 
arc almost always driven by a simulated IID sequence. Recently it has been 
shown that replacing an IID sequence by a weakly completely uniformly 
distributed (WCUD) sequence leads to consistent estimation in finite state 
spaces. Unfortunately, few WCUD sequences are known. This paper gives 
general methods for proving that a sequence is WCUD, shows that some 
specific sequences are WCUD, and shows that certain operations on WCUD 
sequences yield new WCUD sequences. A numerical example on a 42 dimen- 
sional continuous Gibbs sampler found that some WCUD inputs sequences 
produced variance reductions ranging from tens to hundreds for posterior 
means of the parameters, compared to IID inputs. 
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1. Introduction 

In Markov chain Monte Carlo (MCMC) sampling, random inputs arc used 
to determine a sequence of proposals and, in some versions, their acceptance 
or rejection. Considerable effort and creativity have been applied to devis- 
ing good proposal mechanisms. Many examples can be found in recent books 
such as Newman and Barkema (1999), Liu (2001), Robert and CascUa (2004), 
and Landau and Binder (2005). 

With very few exceptions, described below, the proposals and acceptance- 
rejection decisions are all sampled the same way. A sequence of points Ui £ (0, 1) 
that simulate independent draws from the [/[0, 1] distribution is used to drive 
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the process. By replacing IID draws with more balanced samples, we can hope 
to improve the accuracy of MCMC. 

MCMC sampling is subtle and modifying the IID driving sequence with- 
out theoretical support is risky. To do so is to simulate a Markov chain us- 
ing random variables that do not have the Markov property. Caution is in 
order and few have tried it. According to Charles Geyer, writing in 2003, 
(http : //www . Stat .umn. edu/geyer/mcmc/talk/mcmc .pdf) "Every MCMC-like 
method is either a special case of the Metropolis-Hastings-Green algorithm, or is 
bogus" . Our objective in this paper is to show that a variety of alternative sam- 
pling schemes avoid the latter category. They yield consistent MCMC estimates 
whose accuracy we investigate empirically. 

Two exceptions to IID driving sequences that we know of are Liao (1998) 
and Chaudary (2004). The first proposed using randomly reordered quasi-Monte 
Carlo points in the Gibbs sampler. The second proposed strategic sampling of 
proposals with weighting of rejected proposals. Both found empirical evidence 
of a modest improvement, but neither gave any theory to support his method. 

We will follow up on Liao's proposal, which is illustrated in Figure 1. The 
quasi-Monte Carlo points, which are rows of the matrix A are very uniformly 
distributed through [0,1]^. They get reordered into a matrix X that then has 
its rows concatenated into a long vector u that gets used to drive the MCMC 
computation. We refer to both u and X as the driving sequence. 

Recently Owen and Tribble (2005) proved consistency for Metropolis-Hastings 
algorithms, including Gibbs sampling as a special case, when the driving se- 
quence Ui are completely uniformly distributed (CUD), or even weakly CUD 
(WCUD), as defined below. That work built on Chentsov (1967) who gave con- 
ditions under which CUD sequences lead to consistency when Markov chains 
are sampled by inversion. 

We give formal definitions of CUD and WCUD sequences below. For now, 
note that a CUD sequence is one that can be broken into overlapping d-tuples 
that pass a uniformity test, and that this holds for all d. Replacing IID points by 
CUD points is similar to using the entire period of a random number generator. 
When the points of the CUD sequence are well balanced, the effect is to get a 
quasi-Monte Carlo (QMC) version of MCMC. One then has to design or choose 
random number generators with good uniformity that are small enough to use 
in their entirety. 

The theory in Owen and Tribble (2005) applied to infinite sequences, but the 
examples there used finite sequences that were not yet known to be WCUD. This 
paper establishes that several classes of constructions give WCUD sequences and 
shows how certain natural operations on WCUD sequences yield other WCUD 
sequences. We then illustrate the use of WCUD sampling on a 42 parameter 
Gibbs sampling problem and find that the posterior means are estimated with 
variance reductions ranging from tens to hundreds. A detailed outline of this 
paper follows. 

Section 2 provides background material on MCMC, QMC, and CUD se- 
quences. In Section 3 a new definition of triangular array (W)CUD sequences is 
made, suitable for QMC constructions that are not initial segments of infinite 
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Fig 1. This figure illustrates the construction used in Liao (1998). The matrix A £ [0, l]"'*" 
has its rows randomly reordered by a permutation t, creating the matrix X g [0,1]"'^*. The 
rows of X are then concatenated into one long row vector whose contents are then used to 
define the vector u 6 [0, 1]"". The values of u are then fed into the MCMC algorithm. 

sequences. Theorem 2 shows that triangular array (W)CUD sequences lead to 
consistent MCMC estimates. Sometimes it is simpler to prove a pointwise ver- 
sion of the (W)CUD property instead of the uniform one required. Lemma 1 
shows that pointwise (W)CUD sequences are necessarily (W)CUD. (W)CUD 
sequences arc defined through overlapping blocks of conscctive values, but some 
QMC constructions make it easier to work with non-overlapping blocks. Theo- 
rem 3 shows that a (W)CUD like property defined over non-overlapping blocks 
suffices to prove the original (W)CUD property. That theorem also shows that 
one can restrict attention to any convenient infinite set of dimensions. 
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Next wc turn to specific constructions. In Section 4, Theorem 4 shows that 
a lattice construction of Niederreiter (1977) leads to a CUD sequence. Cranley- 
Patterson rotations are commonly used to randomize lattice rules. Lemma 2 
shows that Cranley-Patterson rotations of Niederreiter's sequence, or indeed 
of any (W)CUD sequence, are WCUD. Section 5 investigates Liao's proposal 
for randomly reordering the points of a QMC sequence. Lemma 3 gives a non- 
asymptotic bound for the discrepancy of the reordered sequence in terms of 
the discrepancy of the original one. Then Theorem 5 gives conditions for the 
reordered points to be WCUD. Liao's proposal simply shuffles a QMC sequence. 
Theorem 6 shows that some generalizations that rearrange the QMC points are 
also WCUD. Section 6 shows that wc can mix IID J7[0, 1] sequences into WCUD 
sequences in a certain way, and end up with a WCUD sequence. This then 
proves, via a coupling argument, that Liao's proposal for handling acceptance- 
rejection sampling leads to consistent estimates. 

Section 7 presents an example, with a probit model on some data of Finney 
(1947), using QMC-MCMC to drive the Gibbs sampler from Albert and Chib 
(1993). In this example, the parameter vector has 42 dimensions. The variance 
reductions attained are typically in the range from 20 to 40 but some improve- 
ments of over 500-fold were attained. Section 8 summarizes our findings, dis- 
cusses rates of convergence and extensions to continuous state spaces. 

To conclude this introduction, we mention some related work in the litera- 
ture that merges QMC and MCMC ideas. L'Ecuycr et al. (2005) have devel- 
oped an array-RQMC method for simulating Markov chains on an ordered state 
space. Craiu and Lemicux (2007) combine antithetic and stratified sampling in 
the multiple-try Metropolis algorithm of Liu et al. (2000). James Propp and 
co-workers have been applying QMC ideas to derandomize some randomized 
automata. At present the best way to find this work is via Internet search using 
the term "rotor-router" . Lemieux and Sidorsky (2005) use quasi-Monte Carlo 
sampling to drive the exact sampling of Propp and Wilson (1996). 

2. Background 

This section sets out the notation for MCMC, assuming some familiarity with 
Markov chain Monte Carlo methods, as described for example in Liu (2001), 
Robert and Casella (2004) or Gilks et al. (1996). Our version works on a finite 
state space with Metropohs-Hastings type proposals. 

Then we describe quasi-Monte Carlo sampling (QMC) and use it to define 
CUD and weakly CUD sequences. Finally we show how CUD and weakly CUD 
sequences can be used as driving sequences for Metropolis-Hastings sampling. 

We use integers m, s, and d to describe dimensions in this paper. One 
Metropolis-Hastings step typically consumes m uniform random numbers. A 
quasi-Monte Carlo point set is typically constructed for a finite dimension s. It 
is most natural to arrange matters so that m = s, but we need two distinct inte- 
gers because m 7^ s is also workable. Finally we will need some equidistribution 
properties to hold for all d in an infinite set of dimensions. 
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Points in [0, 1]™, [0, 1]'', or [0, l]** arc denoted by letters such as x and z, or 
when there are many such points, by .t*-'-* or z^*^' for integer indices. Components 
of such tuples are denoted as Xj or z^^\ For j < k, the notation xj-k denotes 
the (fc — J + l)-tuple taken from components j through k inclusive of x. 



2.1. Markov chain Monte Carlo 




We describe MCMC for sampling from a distribution tt on a discrete state space 
= {(jji, . . . , ijJk} for K < oo. We suppose as is usual that the ratio tt{lu)/tt{u!') 
can be easily obtained for any pair 6 fi, or at least for any pair that 

can arise as two consecutive samples. Starting with some point a"'"' G fi, the 
MCMC simulation generates a Markov chain tj^'' G 51 for i > 1 whose stationary 
distribution is tt. 

Common usage of Metropolis-Hastings sampling make a proposal based 
on cD*^'' and m — 1 consecutive uj from the driving sequence: 

w^'+i) =*(tj(*),w™+i,...,u™+™_i). (1) 

Acceptance or rejection of the proposal is based on one more member of the 
driving sequence as follows: 

else, ^ ' 

using the Metropolis-Hastings acceptance probability 

~N . 7r(a;)p(a3 — > w) \ ,„. 

A{uj^uj)=mm(l,^^ J-) . 3 

V Tr[to)p(UJ ^ UJ)/ 

Here p{uj — > uj) denotes the probability of proposing a transition from lo to lo. 

The mechanism described by equations (1), (2), and (3) is less general than 
the usual MCMC. It leaves out the case where acceptance-rejection sampling 
is used to generate one or more of the components of the proposal. Section 6 
shows how one can splice in HD elements of the driving sequence for that case. 

The function ^ that constructs proposals may involve inversion of CDFs 
or apply other similar transformations for generating random variables given 
by Devroye (1986). We will however need a regularity condition (Jordan mea- 
surability) for the proposals. This very mild condition rules out pathological 
constructions. 

Definition 1 (Regular proposals). The proposals are regular if for allto^uj G O 
the set 

S~ = {(Ml, • • • , Wrn-l) I W = «'(W, Ui, . . . , U„,_l)} C [0, 1]"-^ 



is Jordan measurable. 
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Jordan mcasurability of 'S'^^~ means that it has a Ricmann integrablc indi- 
cator function. The boundary of such sets has volume zero. Proposals that do 
one thing for Ui with rational components and another for irrational u,; might 
be ruled out. Practically useful proposal mechanisms are regular. 

By stage n the fraction of time spent in state uj is 

1 " 

7r„(tj) = - Vl^m^^. (4) 
n ^ — ' 

i=l 

Consistency of 7r„ is defined differently for random and nonrandom Ui, as follows. 
Definition 2. The chain is consistent if 

lim 7r„(ijj) = 7r(cj) (5) 

n — 'oo 

holds for all tjjtj^'^^ S fl. 

Definition 3. The chain is weakly consistent if 

lim Pr(|7r„(a') - 7r(w)| > e) = (6) 

n — ^oo 

holds for all uj,uj'^^') G fl and for all e > 0. 



2.2. Quasi-Monte Carlo 

In quasi-Montc Carlo sampling, one docs not simulate randomness. Instead one 
picks points that arc more uniform than random points would be. Here we sketch 
QMC sampling. The reader seeking more information may turn to Nicderrcitcr 
(1992). 

For a point z = (zi, . . . , Zs) S [0, 1]'* let [0, z] denote the s dimensional box 
bounded by (0, . . . , 0) at the lower left and z at the upper right. That is [0, z] 
is the Cartesian product 0^=1 [Oj^j]- The volume of this box is Vol([0, z]) = 
Y[j=i ^j- Given n points a;'"'^', . . . G [0, 1]'' define the empirical volume of 

the box as 

2 " 

Vol„([0,2;]) = - ^lx('>e[o,z] 
1=1 

and the local discrepancy function 

5^(z) = <5^(z;x(i), . . . , = |Vol„([0, z]) ~ Vol([0, z])\. (7) 

We suppress the dependence of Vol on s and x^^^ to keep notation uncluttered. 
The star discrepancy of x^^^ , . . . , x^") G [0, 1]* is 

AT = Ar(^^'^■•■,^^"^)= sup 5„(z;xW,...,:r(")). (8) 

ze[o,i] = 
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The star discrepancy is an s dimensional generalization of the Kolmogorov- 
Smirnov distance between J7[0, 1] '' and the empirical distribution of the points x^'^ 
Let the function / : [0,1]" R have total variation ||/||hk in the sense 
of Hardy and Krause. For s = 1, ||/||hk reduces to the usual concept of the 
total variation of a function. For s > 1 one must take care to use a measure of 
variation that does not vanish when / depends only on a subset of its arguments. 
Variation in the sense of Hardy and Krause does so, as described in Owen (2005). 
Then the Koksma-Hlawka inequality 

< A? X II/IIhk (9) 

shows the advantage of low discrepancy for integration. There are QMC con- 
structions for which D** = 0{n~^^'^) holds for any e > 0. Then functions / of 
finite variation can be integrated with an error rate that is superior to the famil- 
iar 0[n~^/^) root mean square error rate of Monte Carlo. The asymptotics can 
be slow to set in but in practice the accuracy of QMC ranges from comparable 
with MC to far superior to MC. 

2.3. CUD and weakly CUD sequences 

Quasi-Monte Carlo points will not always lead to the right answer when used 
to drive MCMC sampling. The QMC constructions that can be made to work 
are the ones that are completely uniformly distributed (CUD) and weakly CUD 
(WCUD) as defined below. 

Definition 4 (CUD). The infinite sequence Ui G [0, 1] for i > 1 is completely 
uniformly distributed, if 

lim (mi, lid), (m2, ■•■ ,Wd+l), ... (w„, ... ,Md-(-n^l)) =0 (10) 

n — 'oo V / 

holds for every integer d > 1. 

Knuth (1998) describes several working definitions of randomness for random 
number generators. One of them is that the sequence be CUD. The concept of 
CUD sequences originated with Korobov (1948). Levin (1999) gives a recent 
survey of CUD constructions. 

Definition 5 (Weak CUD). The infinite sequence of random variables Ui G [0, 1] 
for i > 1 is weakly completely uniformly distributed, if 

Jim^Pr(^-D*''(^(Mi, . . . ,Ud), (u2, ■ • ■ ,Ud+i), • ■ • ("«, ■ • ■ ^Wd+n-i)) > = (11) 

holds for every integer d > I and every e > 0. 

Theorem 1. Let tj'"' G 51 = {^i, . . . ,u}k}- For i > let be a regular 

proposal generated from {umi+i, ■ ■ ■ , ?^mi+m-i) andto^^'' via equation (1), and let 



r 1 

/ f{x)dx--J2fi^^'^) 
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determined from Uis by equation (2). Assume that weak consistency (6) 
holds for all djjtj^*'-' E and all e > when Ui are independent ?7[0, 1] random 
variables. If the IID Ui are replaced by weakly CUD Ui, then the weak consistency 
result (6) still holds. If the IID Ui are replaced by CUD Ui, then the stronger 
consistency result (5) holds. 

Proof. Owen and Tribble (2005), Theorem 3. □ 

Theorem 1 does not exphcitly make any assumptions about whether the 
chain is ergodic or even whether it is irreducible. Those considerations are very 
important, but they are buried in the weak consistency assumption (6). If a 
proposal mechanism is known to be consistent for an IID driving sequence, 
then it remains so for driving sequences that are CUD or WCUD. Therefore 
MCMC sampling separates into two problems. One is selecting a good proposal 
mechanism. The other is selecting a weakly CUD driving sequence, where of 
course IID points constitute the usual choice. 

3. Extensions of WCUD 

This section presents some basic results for (W)CUD sequences that we use later 
to show that specific constructions give rise to consistent MCMC sampling. We 
define a triangular array notion of (W)CUD sequences for use with finite driving 
sequences. 

Theorem 1, taken from Owen and Tribble (2005) applies to infinite sequences 
Ui. In practice one uses a finite sequence of length N . A theory for large TV has to 
account for the fact that the constructions are not always nested: for A^i < N2 
the sequence of length A^2 might not be an extension of the sequence of A^i 
points. 

To formulate our limit, we take a triangular array in which the row indexed 
by N has points wjv.i, ■ • ■ , wat^tv £ [0, 1] that we use to compute tt. The value N 
belongs to an infinitely large nonrandom set TV of positive integers. In terms of 
Figure 1, the bottom row has N = ns numbers in [0, 1]. Asymptotics for Liao's 
construction involve a limit of such figures indexed by N . Other constructions 
similarly involve limits of finite sequences. 

The number n of transitions made depends on N . When each transition 
consumes m points of the driving sequence, then n{N) — \N/m\. The estimate 
using um,i, ■ • ■ , utv.tv is T^n{N){.'^)- Consistency or weak consistency holds when 
for all w, 7r„(7v)('^) converges, or converges weakly, to 7r(a;) as iV — s- 00. Limits 
as A'' — *• 00 are always understood to be through the values N £ JV. 

Definition 6 (Triangular array (W)CUD). Let ujv.i, • . • , mat, at G [0, 1] for an 
infinite set of positive integers N e Af. Suppose that as N ^ 00 through the 
values in J\f , that 

^*N-d+l{{'^N,l, ■ ■ ■ , UN,d), ■ ■ ■ , {uN,N-d+l, ■ ■ ■ , Un^n)) — > 
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holds for any integer d > 1. Then the triangular array {uN,i) is CUD. If the 
UN,i are random and 

Pr{D}f_^^j^{{uN,i, . . . , UN,d), ■ • ■ , (ujv,jv-d+i, • ■ • , un^n)) > e) ^ 

as N ~> oo through values in J\f holds for all integers d > 1 and all e > 0, then 
the triangular array (uN,i) is weakly CUD. 

If an infinite sequence ui,U2,... is CUD (or WCUD) then the triangular 
array of prefixes taking u^^i = Ui, for aU > 1 is also CUD (respectively 
WCUD). This means that the triangular array definitions are broader than the 
original ones. 

Theorem 2. Suppose that the transitions are as described in Theorem 1, in- 
cluding weak consistency (6) when Ui are independent [/[0, 1] random variables. 
Let each transition consume m of the Ui . If ui are replaced by elements UN,i of 
a CUD triangular array then 

lim 7rLAr/,„j(cj) = 7r(w). (12) 

// Ui are replaced by elements uj\j i of a WCUD triangular array then 

lim Pr(|^LAf/„j (w) - tt{uj)\ > e) = 0. (13) 

Proof. The proof is similar to that of Theorem 3 in Owen and Tribble (2005), 
and so we only sketch it. Fix e > and identify the set %{e) C [0,1]'"™ for 
which X^c^en ~ > e) > e holds when (ui, . . . ,Urm,) £ For 

large enough r the set Tr{e) has volume no more than e. Then apply Definition 6 
using d = rm. The rest of the proof follows as in Owen and Tribble (2005). □ 

In proving that converges to zero, the natural first step is to show that 
the local discrepancy 5'^{z) tends to zero at each z. It turns out that such a 
pointwise (W)CUD property implies the (W)CUD property. 

Definition 7 (Pointwise (W)CUD). The triangular array UN,i G [0, 1] is point- 
wise CUD, if 

lim (5^_rf , i( z; (uAr_i, . . . ,MAr,d), . . . , (wAT.AT-d+i, • ■ ■ ,ujv,jv) ) = (14) 

holds for every integer d > 1 and every z G [0, 1]''. The triangular array of 
random variables UN,i G [0, 1] is pointwise weakly CUD, if 

\\m^VY{5%_^_^i(^z-, {uN,i, . . ■,UN,d), ■ • ■ , {uN,N-d+i, ■ ■ • ,iijv,jv)) > = 

(15) 

holds for every integer d > 1, every z G [0, 1]'', and every e > 0. 
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Lemma 1. // (14) holds for all z G [0, 1]'' then 

^*N-d+l {y^N,!, ■ ■ ■ , UN,d), ■ • ■ , iuN,N-d+l, • ■ • , UN^^fj 0, 

as N ^oo. If (15) holds for all z e [0, l]'^ then 

Pl-(^D*j^_^l_^_-i^(^{uN,l, ■ . • ,M7V,d), • ■ • , (uN^N-d+l, ■ • ■ ,1tAr,w)) > ^ 0. 

If UN.i ciT^e pointwise CUD then they are CUD. If UN,i are pointwise WCUD 
then they are WCUD. 

Proof. The final two statements follow from the first two which we prove here. 
Pick e > and then choose a positive integer M > 1/e. Next let £ be the lattice 
of points in [0, 1]'' whose coordinates are integer multiples of l/{2dM). 

For any z € [0,1]^^ we may choose z' , z" G C such that the following hold 
componentwise: z' < z < z", \z ~ z'\ < e/{2d), and \z ~ z"\ < e/{2d). Then 

< Vol([0, z"]) - Vol([0, z]) < e/2. 

Next, for N > d, let Vol([0, z]) denote the fraction of the N — d + 1 points 
{uN,i, ■ ■ ■ ,UN.i+d-i) that are in [0, z]. Then 

Voi([0, z]) - Vol([0, z]) < Voi([0, z"]) - Vol([0, z]) < e/2 + 

and with a similarly obtained lower bound via z', we get D*/ < e/2 + 
maxj,g£ i5^_(i_|_i(y)- The CUD case follows via (14) with e ^ 0. Taking thresh- 
old e/2 in (15) yields Pr(maxj/g£ (5^_^_^i(?/) > e/2) ^ 0, proving the WCUD 
case. □ 

The (W)CUD properties are defined in terms of consecutive blocks of ob- 
servations that overlap, at least when d > 1. It is often useful to consider 
non-overlapping blocks of points such as (ui, . . . , Ud), (ud+i, • ■ • , U2d), ■ ■ ■ ■ For 
infinite deterministic sequences it is known (Kmith, 1998, page 155) that if the 
discrepancy of overlapping sequences tends to zero for all d then the discrepancy 
of non-overlapping sequences also tends to zero for all d. The converse (with 'for 
all d' in both clauses) also holds. See Chcntsov (1967). 

We need sufficiency of non-overlapping block results for the random case as 
well. Also it is helpful to be able to work with only a convenient subset of 
dimensions d. Theorem 3 below shows that such special cases are sufficient to 
prove the WCUD property and hence consistency. 

Theorem 3. Let J\f be an infinite set of nonnegative integer sample sizes 
and let T) be an infinite set of nonnegative integer dimensions. Let UM,i be 
a triangular array for i = 1,...,N and N e TV". For integer d > 1 define 
the nonoverlapping d-tuples x^'^^ ~ x^'^^{d,N) — {u^~^^ i)+v ' ' ^'^Ndi) ■^'^^ 

1 — 1, . . . , M ~ M{N, d) = lN/d\ . For integer d define the ordinary d-tuples 

= x(') (d, N) = {UN,^, . . . , UN,^+d-l) fori^l,...,N-d+l. 
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Suppose that 



Van Pi{Dll{x^^\. . .,x^^^^) > e) = (16) 



holds for all d eV and all e > where M M {N, d) . Then 

lim Pr«_,+i(zW,...,x(^-'^+i))>6)=0 (17) 

A' — »oo 

holds for all e > and all integers d > 1, so that u^^i are WCUD. 

Proof. Let e > and > 0, let d be a positive integer, and suppose that 
z e [0, 1]''. Choose d £ V with d/d < e/3. Because N is tending to infinity, we 
may assume that d/N < e/3. 

For i = 1, . . . , N — d+1 let x^'^ = (ujv,j:, ■ • ■ ,UN,i+d-i) and for fc = 1, . . . , M = 
\N/d\ let x'^'^^ = {u^~,, ). Most of the x^'^ are nested within 

' -J ^ N,d[k—l) + l^ ' N,dk' 

exactly one of the x'^^^ as follows. For i — 1, . . . , N ~ d + \. define k ~ k{i) by 
(fc - l)d+ l<i< fed and define e{i) by £^i-{k- l)d. If k{i) < M{N, d) and 
1 < ^ < d — d+ 1 then the components ujv,i, • ■ • , uj^^ij^d-i of x*-'-* are in positions 
£ through ^ + d - 1 of x^^\ That is x^"^ = x^^(f,^d_^y 



Now 



7V-rf+l U d-d+1 

El^(i)prn ,1 < 1 ~ +Md + d~l 



2,((fe-l)d+f)gro ,1 
i=l k=\ l=\ 

M 

<Y.f{xu) + '2eNl?,, 

k=l 

where f{x) = ^7,.,^,+^_,^e[o,^]■ ^he integral of / over [0, l]*^ is (d - d + 

l)Vol([0, z]). The function / is piecewise constant within a finite set of axis 
parallel hyperrectangular regions in [0, 1]'*. It follows that for some K < oo 
\M-' Efii fi^k) ~{d-d+ l)Vol([0, z])| < KDj{x(^\. . . , 

Therefore for small enough el^ > having D'^j < e+ will imply that {N — d + 
Si!=T'*^^ ^x(^^£la.z] < Vol([0, z]) + e. Similarly for small enough e _ > hav- 
ing D*J < ?_ wiU imply that {N -d+1)-^ Efji"^' l.We[o,.] > Vol([0, z]) - e. 
Therefore when < 7 ~ min(e+,?_) we have Sff_j^_^_i{z) < e. By equa- 
tion (17) we can choose N & J\f large enough that Pr((5^_^^^(z) > e) < ?/. 
Because z, e, and 77 are arbitrary we have shown that u^^i are pointwise weakly 
CUD. To complete the proof we apply Lemma 1. □ 

4. Lattice constructions 

Niederreiter (1977) gives a result that shows how lattice rules may be used 
to construct a triangular array that is CUD. Let TV be a prime number. Let 
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uq ~ l/N and for i > 1 let Ui ~ aui^i/N mod 1 where a is a primitive root 
modulo N . For integer dimensions s > 1, there are — 1 distinct consecutive 
s-tuples in this sequence; call them x^'^ = (u^, . . . ^Ui+s-i)- Niederreiter (1977) 
shows that for well chosen a = a(A^) that 

(18) 

holds, where (p is Euler's totient function. 

The totient function 4){n) counts the number of positive integers less than 
or equal to n that are relatively prime to n. The totient function grows quickly 
enough for our purposes because 

(hiTi) 

lim inf log(log(n)) = exp(— 7), 

n — ^00 n 

where 7 = 0.5772 is the Euler-Mascheroni constant. As a result there exists an 
A < 00 and an TVq < 00 such that 

i?;f_i(x(i),...,a;(^-i)) < ^log(log(7V))(log7V)^ (19) 

holds uniformly in s > 1 and TV > Nq. The constant A in (19) does not have 
to grow exponentially with s because the factor 2/7r in equation (18) is smaller 
than 1. 

Theorem 4. Let M he an infinite set of prime numbers. Let s{N) he a nonde- 
creasing integer function of N € Af satisfying s{N) = o([log(A'^)/ log(log(iV))]") 
for some positive a < 1. For each N £ Af let a{N) be a primitive root modulo 
N for which (18) holds. Form a triangular array via uat.i = a{N)/N mod 1 
and ui\[,i = auN^i-i/N mod 1 for i = 2, . . . , N — 1. Then the triangular array 
(uN.i) is CUD. 

Proof. For any d>l choose Nd with s{Nd) > d. Then for all N > Nd we have 
D'pf^i smaller than the right side of (19). Now the growth condition on s makes 
i^^li ^0. □ 

Owen and Tribble (2005) employed a method of running through the lattice 
rule more than once, so as to use all A^ — 1 of the s-tuples in it exactly once. That 
work also prepends s zeros to the sequence. The result is that n{N) = N, and 
the set of s-tuples used to drive the MCMC form a lattice rule Sloan and Joe 
(1994) in [0,1]*. The lattice rule structure is much more balanced than ran- 
dom Ui would be and this accounts for most of the improved accuracy seen 
there. Prepending one single s-tuple will not affect the CUD property of over- 
lapping d-tuples for any d > 1. Similarly, shifting the points from ui, . . . , to 
itfc+i, . . . , UN^ Ml, • ■ • , Uk-i for any finite k does not destroy the CUD property 
(Clientsov (1967)). Finally concatenating a finite number of CUD triangular 
arrays with the same iV ^ 00 yields a CUD result. 



S. D. Tribble and A. B. Owen/Weakly CUD sequences for MCMC 



646 



Lattice rules are commonly randomized via a rotation due to Cranley and 
Patterson (1976). Let a £ [0,1]" and suppose that U ^ J7[0, 1]-'. Then the 
Cranley-Patterson rotation of a is the point x = a+U mod 1 under component- 
wise arithmetic. Whatever value a has, the point x is uniformly distributed on 
[0, 1]*. A Cranley-Patterson rotation applied to all points in a s-dimensional 
low discrepancy lattice yields a shifted lattice with points that are individually 
U[0, lY while collectively having low discrepancy. 

In terms of Figure 1, this proposal starts with the matrix X, skipping the 
randomization of A. The matrix X has Os in its first row and then all other 
s-tuples of the lattice obtained in order with possibly multiple passes. Then a 
Cranley-Patterson rotation is applied to each row of the matrix. Finally, the 
concatenation into a single u vector works as in Figure 1. 

Owen and Tribble (2005) applied a single Cranley-Patterson rotation to all of 
the s-tuples {urs-s+i, ■ ■ ■ , Urs) {r = 1, . . . ,N) in the MCMC driving sequence. 
As we show next, applying a Cranley-Patterson rotation to any CUD or WCUD 
triangular array yields a triangular array that is WCUD. 

Lemma 2. Let uj^^i G [0, 1] for i = 1, . . . , TV and N in an infinite set J\f of 
nonnegative integers. Define v^^i = UM,i + C^j(i) mod 1 where j{i) = 1 + (i — 1 
mod m), for integer m > 1. If una are (W)CUD and {Ui, . . . , Um) ^ f/[0, 1]™ 
independently ofuM,i, thenv^^i are WCUD. 

Proof. Suppose that u^a are WCUD, and let z G [0,1]'' where d — rm for 
integer r > 1. Let z''' = {vN,di-d+i, ■ ■ ■ TVN,di) & [0,1]'' be the i'th d di- 
mensional point taken from the rotated triangular array {vM,i). Let x'^^^ = 
(wjv,di-(i+i, • • ■ ,UN,di) be thc prc-imagc of z''-* before Cranley-Patterson rota- 
tion was applied. Then z^*) G [0, z] if and only if a;^*'' G B where B = B{z, U) 
is the union of up to 2'^ axis parallel rectangular boxes in [0, 1]''. Therefore the 
local discrepancy satisfies 5'^_^j^^{z) < KD*jf_^^^{x''^\ . . . ,2;^^^'^+^)) for some 
X < cxo. It follows that Pr((5^_^^j^(z) > e) ^ for any z and any e > 0. 
Therefore Pr(_D*'' > e) — + for any d that is a multiple of to. Therefore VN,i 
is WCUD. If the ujv.i are CUD they are also WCUD and so then VN,i are still 
WCUD. ' ' □ 

5. Consistency of Liao's proposal 

Let a^^\ . . . , a^"-' be points in [0, 1]*. In Liao's proposal these points are of low s 
dimensional discrepancy. Let where r is a uniform random permu- 

tation of 1, ... , n. Liao's proposal is to concatenate the points a;*-*^ into a driving 
sequence for MCMC. We need to show that those points are WCUD. It is very 
natural to make the dimension s of the reordered QMC points equal the number 
TO of driving points consumed by one transition of the chain. It is not however 
necessary to use to = s as we show below. 

Let ui, . . . , Usn G [0, 1] be the components of the points x*-'-* concatenated. 
The point Ui comes from the [i/s]'th point in the sequence, specifically 
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Table 1 

A sequence of 4 dimensional points , represented in the top row, is concatenated into a 
sequence of scalar s m, represented in the middle row. Those scalars are then regrouped into 
the 7 dimensional points 2^*^ as shown in the bottom row. 

7m 1 7m 1 7m 1 ~. 



Ui U2 U'i Ui Us UQ Uf Us Ug Uio "11 "12 «13 "14 

7m \ 7m 



Now let z^^\ . . . , z^™-* be the points Ui regrouped into m = [sn/d\ consecutive 
batches of length d, leaving out the last sn ~ dm of the Ui if m < sn/d. That is 



z 



j - - a;d(t-i)+j-s(r(<i(j-i)+j)/si-i)- \'^^) 

The situation is illustrated in Table 1 for the case with s = 4 dimensional points 
regrouped into d = 7 dimensional points z^'^\ The middle row in Table 1 
corresponds to the last row in Figure 1 and the bottom row of Table 1 shows 
the vectors we need to analyze the d dimensional discrepancy of the driving 
sequence for d 7^ s. 

We will avoid working directly with the rightmost expression in (20) by break- 
ing the a;*-*^ into chunks. To illustrate, consider the point z*^^) in the example of 
Table 1 and let z e [0, 1]^. Then z^^) e [0, z] if and only if 

4''g[0,zi], e [0,Z2:5] and, x[^l e [0 , z^a] (21) 

all hold. The next Lemma takes care of the main details needed to get a bound 
for the discrepancy. 

Lemma 3. For i — 1, . . . ,n, let a^*^ be points in [0, 1]" with star discrepancy at 
most D. Let where t is a uniformly distributed random permutation 

0/1, . . . , n. Fori = 1, . . . , [ns/d] let z*-'-* be obtained as in equation (20). Assume 
that D < 1/3 and that n > 3[(d - l)/s] . Then for any z G [0, 1]^ 

|Pr(z« e [0,z])- Vol{[Q,z\)\ < ^(2^+nd-l)/s^ ^ {j (^o + l^!L_ljj (22) 

Proof. Because the x^*'' are a permuted version of a*-'-* they have the same star 
discrepancy, which is at most D. Furthermore for 1 < j < fc < s the k — j + 1- 
dimensional star discrepancy of x^j^l, ■ ■ ■ , x'^l is at most D. 

The point z*-*' is comprised of chunks taken from consecutive x^^^^s. The 
number of contributing chunks C is between \d/ s] and 1 + \{d — l)/s] inclusive. 
The upper limit is attained if the first component of z^*' is the last component 
of one of the x^^^ and the lower limit is attained if the first component of z^*) is 
the first component of one of the x'^^'' . 

For chunks c = 1, ... ,C let L{c) and U{c) be the first and last indices in 
z^*^ that are taken from chunk c. Let j{i) be the integer such that the first 
component zf^ is taken from x^^\ Then chunk c of z*^'-* is taken from x^^^^^^'^~'^^ 
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for c = 1, . . . , C. Let €(c) and u(c) be the first and last indices in a;^-'^'-'^'^^^-' 
that are used to form the c'th chunk of z^'^\ Tliat is 

Next for 1 < ^ < u < s let Np-uiz) = ^ 1 (o , count the number of 

a::^*) whose i : m subcomponents are in the box [0, z^.y]- 

To streamline notation we write for the c'th chunk x^p(^^^y^l^^^ ^\ for 

the c'th box [0, Zi(c):c/(c)]i for Vol(i?c), and A^c for -^f(c):u(c)(-z). From the 
discrepancy bounds we know that 

n{vc - D) < Nc< n{vc + D). 

Now Pr(z('^ G [0, z]) is the product of C conditional inclusion probabilities: 

c _ _ 

Y[ Pr(x, G I e 1 < c' < c) . (23) 

What is random in (23) is the selection, by simple random sampling, of which 
a'-*'^ will become x^^^'^'^^'^^^K The conditional probability for chunk c is between 
max((A^c — c + 1) /{n — c + 1), 0) and min(A^c/ ('^ ^ c + 1), 1) depending on how 
many of the suitable a'-'^-' were 'used up' for chunks 1, . . . , c — 1. Clipping these 
bounds to and 1 is only necessary to handle some extreme cases. We use the 
notation to denote max{y,0). 
For the lower bound, 

Pr(.We[o,z])>n ^^'~^^^^^ 



c=l 
C 



>Vol{[0,z])-(2^ -1)(d+^^]. (24) 



The last step in (24) is quite conservative. It follows from an expansion of 
the previous line into 2*-^ terms of which one is Vol([0, z]) and the others have 
alternating signs and are all of smaller magnitude than D + {C — l)/n. The 
upper bound on D and lower bound on n suffice to give Z? + (C — l)/?i < 1 so 
that the largest terms are not (D + (C — l)/n))^ . When D + {C — l)/n is very 
small then the quantity 2*^ — 1 can be replaced by one almost as small as C. 
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For the upper bound, 

C 



Pr(.«e[o,z])<n^7T9TT^n 

c— 1 

c 



- . - n-C+l 

^ ' ' Vr+D 



c=l 



1 - {C-l)/n 



<Voia0,.]) + (2--l)^^ax(^-|±^-.. 

< Vol([0, z]) + \{2^~ 1) [d + ^) . (25) 

The result follows by combining (24) and (25), and using the fact that C < 
l+\{d-l)/s]. □ 

Theorem 5. For i = l,...,n, let a*-*' be points in [0,1]" with star discrep- 
ancy at most _D*. Let x'*-* = a^"^^*'^ where t is a uniformly distributed random 
permutation of 1, ... ,n. For i = 1, • ■ • , [ns/dj = n let z^^^ be obtained as in 
equation (20). Suppose that Z?* — s- as n — > oo. Then for any z € [0, 1]'* and 
d>l, 

E{Siiz;z('\...,z(^^)')^0{n-'+D:) (26) 

as n ^ oo, so that for any e > 

Pr{Si{z; z(i), . . . , z(")) > e) = 0{n-' + D*,) (27) 

as n oo. When d — 1, we have the sharper result 

|^(z;zW,...,z("))|<i?*. (28) 

Proof. Let = 1 if z^*) e [0, z] and Y, = Q otherwise and let v = Vol([0, z]). 
Then 



^ n n 



For large enough n both _D* < 1/3 and n > Sd hold. Using Lemma 3 we find 
that \E{Yi) ~ v\ < Xii-D* + Ku/n holds for large enough ??, where Ku < oo 
are constants from Lemma 3. Also, K12 = when d = 1. 

Now suppose that Yi and Yj are well separated in that |i — j|>S'=l + 
\{d — l)/s]. Then none of the points a;*^'^-' contribute chunks to both z'*) and 
z^-*^. Then the same argument used in Lemma 3 can be adapted to show that 
for large enough n, 

I Pr(y,y, = 1) - < K2iD*„ + A'22/n 
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holds for < oo. The argument simply requires studying the combined set of 
chunks for z(') and z'-^K Then n^E{6^z)^) may be expanded and bounded as 
follows: 



< n{2S - 1) + n^{v^ + K2in-^ + K22DI) - 2n^v{v - Kim'^ - K12DI) + n^;^ 
= n{2S - 1 + K2isld + 2Kns/d} + n^D:^{K22 + 2K21). 

Therefore E{Si{z)'^) = 0{n^^ + 13*) as n 00, establishing (26) and hence 
also (27) by Markov's inequality. Finally (28) follows by counting the number 



Corollary 1. Let a^^\ . . . , a*-"-* be points in [0, 1]'* with star discrepancy — > 
as n — > 00. Then the proposal of Liao (1998) is weakly consistent for Metropolis- 
Hastings sampling. 

Proof. Liao's proposal generates pointwise WCUD points by Theorem 5 and 
hence WCUD points by Lemma 1. They are then weakly consistent by Theo- 
rem 1. □ 

The proposal of Liao (1998) leads to local discrepancies 5'^{z) that vanish, but 
are not particularly small except for d = \. Liao's motivating application was 
Gibbs sampling where the number m of variates required for one cycle matches 
the dimension s of the quasi-Monte Carlo points. 

That proposal gets better than Monte Carlo stratification for the Ui indi- 
vidually and for consecutive g~tuples such as {ukm+ri , Ukm+r2 1 ■ • ■ i Ukm+r, ) for 
A: > that nest within consecutive m = s-tuples. The proposal does not get par- 
ticularly good discrepancy even for consecutive pairs (u^, u^+i) because there is 
a jump from the boundaries of the underlying QMC points when i is a multiple 



Suppose that we have a problem for which we want to stratify successive 
updates to the j'th component of uj. We might want roughly the right number 
of low and high proposals for that component and roughly the right probability 
for consecutive pairs or triples of proposals. In such a case, we might wish to 
arrange that s consecutive proposals for the j'th component of w are generated 
from one of the original s dimensional QMC points. 

Similarly we might have a problem in which we wish to treat the acceptance- 
rejection step of Metropolis-Hastings specially. We could then take s = m — 1 
and use one QMC point for each proposal we need and one QMC point for each 
of s consecutive acceptance-rejections. 

Whether such alternative schemes work well depends of course on how well 
the scheme matches the problem. But such schemes can be used to generate 
consistent samples. Each scheme takes the points of a driving sequence, such 
as Liao's proposal, groups them into consecutive blocks of r = ms points, and 




of z(*) < z. 



□ 



of s. 
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applies a fixed permutation witliin each block. That permutation operation pre- 
serves the WCUD property: 

Theorem 6. Let a'*' e [0, 1]* for i ~ 1, . . . ,n have star discrepancy _D** 
as n — > oo. Let a;(*) = a^^W) where t is a random permutation of {1, ... ,n}. 
Let Vi — 2;^L*f(*pi/s]_i) f>e the sequence of x~components for i = 1, . . . , ns. 

For r > 1 let a be an arbitrary permutation of the integers 0, . . . , r — 1. For 

i> \ let Ui ~ Tjj^j-) where 

jii) = r[{i- l)/r\ + G{i - r[(z - l)/rj) 
Then Ui are WCUD. 

Proof. The Vi are the driving sequence proposed by Liao (1998). The Ui are a 
permutation of them in which no element is moved by more than r positions. 
The arguments in Lemma 3 and Theorem 5 go through as before. All that has 
changed is the number and identity of chunks contributing to a consecutive 
d-tuple of UiS. □ 



6. Acceptance-rejection sampling 

It is often impractical, if not impossible, to generate a transition using an a priori 
fixed number of members Ui of the driving sequence. The primary example of 
such a method is acceptance-rejection sampling, which we sketch here to fix 
ideas. 

To sample a real valued y from a probability mass or density function / we 
begin by sampling y from g instead where f{y) < cg{y) holds for all y e M for 
some constant c G [l,c»). Then we accept y with probability f{y)/{cg(y)). If y 
is not accepted then we keep on sampling from g until a point is accepted. 

For an illustration of acceptance-rejection sampling suppose that g has CDF 
G with an efficiently computable inverse G~^. Then to get a sample from / let 

Vjr^U[0,l], IID, j>l 

Vj = G-^{v2j-i) 

J = mm< J >1\ V2j < — 1-^ > 
I cg{yj)) 

and then deliver y = yj*. To use this method one needs to be able to compute 
the functions and f /(eg). More elaborate versions use k > 1 uniformly 
distributed Vj to produce each proposal. 

Liao (1998) proposes to handle acceptance-rejection sampling by using the 
QMC points to make the first two draws from g and the first two acceptance- 
rejection decisions. If the first two points arc both rejected then he suggests 
drawing from an IID [/[0, 1] sequence until a point is accepted before switch- 
ing back to the QMC points. To simplify matters we'll suppose that only one 
acceptance- rejection step is tried with the QMC points, and that we switch 
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to IID points if that one is rejected. Dcrandomized adaptive rejection sampling 
Hormann ct al. (2004) can be used to construct proposals that are accepted with 
probability arbitrarily close to unity, under a generalized concavity assumption 
on the density. 

To continue the illustration, suppose that the proposal uj has 3 components. 
The first two are generated by inversion, each using one point from [0, 1], while 
the third is done by acceptance-rejection sampling. Then the driving sequence 
for the first n transitions can be represented in a tableau as follows: 

Ul U2 U3 U4 (Vii Vi2 Vi3 ■■■ ) Ug 

Ue U7 Us Ug {V21 V22 V23 ' ' ' ) 1*10 

U5i-4 W5i_3 W5i-2 W5i„i {va Vi2 Wi3 " ' ' ) "tn 

U5„_4 U5„_3 UZn-2 U^n-l {Vnl W„2 WnS ' ' ' ) "5n 

The points Ui are WCUD and the points are independent [/[O, f]. The i'th 
row of the table drives the transition from tj''' to o;'*'*"^-'. For the proposal 
u}^'''~^^\ u^i-A generates the first component io'^^^\ u^i-^ generates the second 
component u)2^^\ u^i-2 proposes the third component u)^^^^\ u^i-i is used to 
accept or reject the third component and u^i is used to accept or reject the 
entire proposal w^'^^' as In the event that ut^i-i leads to rejection of 

the third component, then the infinite sequence Wii,Wi2, ... is used to continue 
acceptance- rejection until a third component is generated for the i'th proposal. 

The difficulty with acceptance-rejection sampling is that the set of driving 
points for which a transition from lj to uj' is proposed does not have a fixed 
finite dimension. It is a union of regions whose dimensions depend on the num- 
ber of proposals rejected during the course of acceptance-rejection sampling. 
This variable dimension complicates discrepancy based methods for studying 
the driving sequence. 

The tabulation above suggests a coupling argument. We replace the sequence 
fii, Vi2, Vi3, ... by a single point Vi S [0, 1]. The values Vi are independent U[0, 1] 
random variables such that the random variable finally generated by acceptance- 
rejection would also have been generated by inversion through Vi. If the compo- 
nent is continuously distributed with CDF H, then Vi = H(uj^~^^''). For discrete 

H we let V, = + - where Vi is U[0,l] 

independent of all other driving variables. That is, using acceptance-rejection 
on the third component can be coupled with the use of inversion for the third 
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component, based on the following driving sequence: 

Ul U2 U3 U4 Vl U5 

Uq U-j Us Uq V2 UlQ 

U5i-4 U5i„3 U5i-2 U5i-i Vi Uin 

""571-4 M5n-3 l'5n-2 U5n-1 "^n "Sn 

Liao's padding proposal for acceptance-rejection sampling will work so long 
as inserting an IID U[0, 1] sequence into his permuted points at regular intervals, 
as illustrated above, preserves the WCUD property. This proposal works more 
generally. If we insert an IID U[0, 1] sequence at regular intervals into a CUD or 
an independent WCUD sequence, then the result is a WCUD sequence. Inserting 
IID points increases thej.ength of a finite sequence, so that row N of the original 
sequence becomes row TV of the new sequence. 

Theorem 7. Let VN,i S [Oil] for i ~ 1,...,N and N in an infinite set of 
positive integers Af. Let Wi for i > I be IID U[0, 1]. For integers m > 2 and 
6 e {0, . . . , m - 1} and i 1, . . . , (m + 1) [N/m\ = N, let 



N: 



w^i/„i] i = b mod m, 

VN,i-[(i-b)/m] else. 



Ifv^j are WCUD and independent of Wi, then u~ . are WCUD. 

Proof. Let d = r{m + 1) for integer r > 1 and choose z £ [0, 1]"^. For k = 
1, . . . , \_N/d\ , the d-tuple x^''^ = (u~ • ■ • , u~ has r components from 

w and dr components from VN,i. Let A represent the components from w and 
B represent the components from v^j. Then 



IN/d] lN/d\ 

1 



Niz)= ^ l.(Oe[o..]= 2^ l.Wefo,.^]!-"' 



The z^^ are IID Bernoulli variables taking 1 with probability Vo1([0,zb]) in- 
dependently of z^"*. Therefore Var(A^(z) | wi, . . .) Q as N 00, while 

E{N{z) I ^ Vol([0,Z5])Elf/''Jl^<.,[o,..]- Now Elf/'J l^«,[o,..] 

converges in probability to [iV/dJ Vol([0, z^i]) as — > cx) because VN,i are 
WCUD. 

It follows that Pi"('5|*^y^j (z) > 0) ^ as ^ 00. Invoking Lemma 1 and 
Theorem 3 completes the proof. □ 

If some fixed number /c > 1 of components are to be sampled by acceptance- 
rejection at each step, then we simply apply Theorem 7 k times inserting k 
independent streams of IID [/[0, 1] random variables. 
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Remark 1. It is important that one of the streams in Theorem 1 be IID. 
Merging two independent and WCUD streams does not necessarily produce a 
WCUD result. For example if the two WCUD sequences ui and Vi are indepen- 
dent Cranley-Patterson rotations of the same underlying deterministic sequence, 
then ui,vi,U2,V2, ■ ■ ■ will not be WCUD because every pair of the form {ui, Vi) 
will lie within one or two lines in [0, 1]^. 



7. Example: Probit regression 



In this section we apply a Gibbs sampling scheme developed by Albert and Chib 
(1993) for a probit regression example of Finney (1947). For i = 1,...,39, 
the response Yi € {0, 1} is 1 if the subject exhibited vasoconstriction and 
otherwise. The predictors are = {Xii,Xi2) where the first component is 
the volume of air inspired and the second is the rate at which air is inspired. 
The probit model has Yi = lzi>o where Zi ~ N{Pq + (iiXn + ^2^12, 1) are 
conditionally independent given Xi, . . . , X^ and /3 = (/?o, Pi, 1^2)' ■ The data are 
shown in Figure 2. 

Taking a non-informative prior for (3, the full conditional distribution of (3 
given Zi, . . . , Zn is N {{X' X)'^ X' Z , (X'X)-i), where X is the n by 3 matrix 
with i'th row {l,Xii,Xi2) and Z is the column vector of Zi values. If = 1 
then the full conditional distribution of Zi given (3 and Zj for j ^ i is N{l3o + 
XiiPi+ Xi2P2, 1) truncated to [0, 00). If =0 then full conditional distribution 
is instead truncated to (—00, 0]. 

To run the Gibbs sampler we need only invert the normal CDF to obtain 
the normal and truncated normal full conditionals. We used the SSJ package 
of L'Ecuycr (2008) which includes a high accuracy inverse normal CDF based 
on Blair et al. (1976). 

The dimension of this simulation is to = 42 variables per step. In our lattice 
sampling we let A'^ be a prime number and choose a to be a primitive root 
modulo N. Suppose at first that m and iV — 1 are relatively prime. Then the 
driving sequence we use is obtained by scanning the following matrix from left 
to right and top to bottom, 



/ 





a 

„2m+l 





-,2m+2 



^^{N-2)7n ^{N-2)m+l ^{N -2)m+2 





^m— 1 



JN-l)m- 



\ 



7 



mod N 



and then dividing by N and applying the same Cranley-Patterson rotation to 
each m = 42 dimensional row. Starting in the second row above, the matrix 
above contains elements from a linear congrucntial generator (LCG), with initial 
seed 1. Unlike LCGs, integration lattices contain a point at the origin, introduced 
here via the first row. 
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Vasoconstriction data 
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Fig 2. Vasoconstriction data from Finney (1947) as described in the text. Cases with vaso- 
constriction are solid points. There are two solid points at (1.9,0.95). 



When m and iV — 1 arc not relatively prime, then the simple scheme above 
does not utilize all — 1 m-tuples of the LCG. For the general case let g = 
GCD(m, TV — 1), so 5 = 1 when m and — 1 arc relatively prime. Then the table 
above contains the vector and g identical blocks of & = {N — l)/g rows. We 
multiply the fc'th such block by a'^~-^ (modulo N) in order to get all m-tuples 
of the LCG into the simulation. 

Ideally a should be a good multiplier for a linear congruential generator, so 
that consecutive tuples are nearly equidistributed. If possible a™ should also be 
a good multiplier so that consecutive updates to a given parameter are also well 
equidistributed. 

We applied our technique with values of N and a shown in Table 2. These are 
from L'Ecuyer (1999). We also applied Liao's method on the same problem. Each 
method was repeated independently 300 times in order to estimate the sampling 
variance. Table 3 shows estimated variance reduction factors for the posterior 
means of Pj. The 0.975 point of the ^299. 299 distribution is 1.25. Therefore 
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Table 2 

Shown are the parameters of the LCGs used to drive the Gibbs sampler for the Finney 

(1947) probit model. Five prime numbers N near powers of two are shown. Five 
corresponding primitive roots a from L 'Ecuyer (1999) are listed. In each case g is the 
greatest common divisor of a and N — 1. The simulation goes through g blocks of 
b= (N -l)/g m-tuples (m = 42) from the LOG. 



N 


1021 


2039 


4093 


8191 


16381 


a 


65 


393 


235 


884 


665 


g 


6 


2 


6 


42 


42 


b 


170 


1019 


682 


195 


390 



Table 3 

This table shows variance reduction factors comparing WCUD-MCMC with IID-MCMC. 
There are 5 sample sizes N and three regression parameters f}j . We are estimating the 
posterior mean of f)j and comparing variances of these estimates. The upper block compares 
a Cranley- Patterson rotated LOG to IID sampling. The middle block compares Liao's 
permutation scheme, on the same rotated LCG, to IID sampling. The bottom block 
compares LCG sampling to permutations. Individual entries between 0.8 and 1.25 are not 
statistically significantly different from 1. 



N 


1021 


2039 


4093 


8191 


16381 


LCG 


/3o 


15.9 


29.9 


22.5 


44.4 


37.6 


vs 


I3i 


14.9 


29.7 


23.3 


41.9 


39.1 


IID 




17.1 


27.4 


22.9 


46.1 


35.2 


Liao 




20.0 


17.9 


23.1 


19.0 


19.0 


vs 




18.5 


18.5 


21.7 


19.8 


20.2 


IID 




21.3 


16.6 


24.1 


20.0 


18.5 


LCG 




0.79 


1.67 


0.97 


2.24 


1.98 


vs 


I3i 


0.80 


1.60 


1.07 


2.18 


1.93 


Liao 




0.80 


1.64 


0.95 


2.30 


1.91 



individual ratios between 0.8 and 1.25 should not be considered statistically 
significant. 

Some trends are clear in Table 3. The variance reductions for all three re- 
gression coefficients track each other very closely. Liao's method typically gives 
a variance reduction of about 20 fold. The LCG method gives a variance reduc- 
tion that tends to increase with sample size but is not perfectly monotone. The 
quality of the underlying lattices may not be monotone in N. For larger N the 
LCG approach performed better than Liao's. For smaller N, Liao's approach 
did slightly better but perhaps not significantly so. 

The QMC-MCMC methods also reduce the variance of the estimated pos- 
terior means for the latent parameters Zi, . . . , Z^g, sometimes by very large 
amounts. When \Zi\ is large then the variance reduction for it is nearly the 
same as we see for the coefficients f3j. When \Zi\ is small, corresponding to 
cases near the borderline, then variance reductions of several hundred fold are 
attained. 

Figure 3 plots the variance reductions for latent parameters versus the esti- 
mated values of those latent parameters. The curves corresponding to the largest 
and smallest sample sizes are shown. The curves for the other sample sizes are 
qualitatively similar. The LCG version attains some much larger variance re- 



S. D. Tribble and A. B. Owen/Weakly CUD sequences for MCMC 



657 




Posterior mean of latent value Z 



Fig 3. This figure shows the sampling variance of MCMC estimates of E{Zi \ Yi, . . . , ¥39) for 
the Finney data. The posterior means themselves are on the horizontal axis. Their variance is 
on the vertical axis with QMC-MCMC values joined with a solid line and IID-MCMC values 
joined with a dashed line. The displayed data are for N = 16,384. 



ductions, sometimes over 500-fold, for the Zi near 0. Table 4 shows summary 
statistics of the variance reductions. 

In MCMC sampling there is usually a bias because the chain only approaches 
its stationary distribution asymptotically. Variance reductions are most mean- 
ingful when the biases of two methods are comparable and small. Because the 
sample values of all 42 parameters averaged over 300 replications are essentially 
identical for all 3 methods at every N, we are sure that the biases of all of these 
methods arc nearly identical here. Table 5 summarizes the evidence on bias. 
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Table 4 

This table summarizes variance reduction factors for the posterior means of the latent 
parameters Zi, pooling all 5 run lengths N. The min, max, mean and quartiles of the 
variance reduction factors are shown. 





Min 


Qo.25 


Qo.5 


Mean 


Qo.75 


Max 


LCG 


10.0 


23.5 


38.9 


68.1 


67.9 


561.6 


Liao 


11.1 


18.6 


22.5 


37.5 


48.4 


157.0 



Table 5 

This table summarizes parameter differences, averaged over replications, between the 
sampling methods. All parameters and all sample sizes are included. The first row compares 
LCG-MCMC to IID-MCMC. The second row compares permuted QMC to IID sampling. 
The third row compares two versions of QMC-MCMC. Corresponding mean and median 
values (not shown) are all within the range ±1.5 X 10~*. 





Min 


Qo.25 


Qo.75 


Max 


LCG - 


MC 


-0.0053 


-0.00072 


0.00053 


0.0073 


Liao — 


MC 


-0.0081 


-0.00068 


0.00049 


0.0087 


Liao — 


LCG 


-0.0029 


-0.00017 


0.00022 


0.0014 



8. Discussion 

This paper has produced some specific constructions of WCUD sequences, has 
given general methods that convert WCUD sequences into other WCUD se- 
quences, and has found conditions that simplify the task of proving that a 
sequence is WCUD. 

Some further results appear in the thesis of Tribble (2007). In particular, Tribble 
(2007) establishes results parallel to the ones here, for methods that use small 
feedback shift register generators instead of small congruential generators. Tribble 
(2007) also introduces a skipping method that simplifies the process of running 
through all s-tuplcs of a small random number generator. 

Our motivating application for studying (W)CUD sequences is for MCMC, 
especially in continuous state spaces. The sequences we construct take place in 
a continuous space, and the transformations we apply are those for continuous 
random variables. For technical reasons, we have analyzed the methods for dis- 
crete state spaces. We expect that some further though hopefully mild regularity 
will be required. For now, it is encouraging that nothing seemed to go awry in 
the continuous example that we ran. 

Our analysis of Liao's shuffling proposal shows that it only improves the rate 
of convergence for one dimensional discrepancies. This fact suggests that his 
proposal will affect the constant, but not ordinarily the rate in the MCMC con- 
vergence. The numerical results appear to bear this out. It is however surprising 
to see as much as a 20 fold variance reduction from a method that only improves 
certain one dimensional histograms of the input sequence. This must mean that 
those one dimensional aspects are relatively important compared to high di- 
mensional and more subtle features. Such a pheomenon has been seen before in 
finite dimensional applications of QMC. The effective dimension can be much 
smaller than the nominal one as described in Cafiisch et al. (1997). Of course 
not all integrands have low effective dimension and we would not expect large 
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variance reductions every time MCMC was applied. Furthermore the posterior 
moments we studied are smooth functions of their arguments and this plays to 
a strength of QMC. 

The LCG scheme by contrast shows steady improvement with increasing 
sample size, though we have no theory that applies to the rate of convergence, 
and no reason to expect that better than the MC rate can be attained for MCMC 
problems. Against the possibility of better LCG performance there is a tradeoff. 
Liao's method is very simple to use and LCGs require parameter searches. 
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